Tight Bounds for the Expected Risk of Linear Classifiers and PAC-Bayes Finite-Sample Guarantees
Authors
Abstract
We analyze the expected risk of linear classifiers for a fixed weight vector in the “minimax” setting; that is, we analyze the worst-case risk over all data distributions with a given mean and covariance. We provide a simpler proof of the tight polynomial-tail bound for general random variables. For sub-Gaussian random variables, we derive a novel tight exponential-tail bound. We also provide new PAC-Bayes finite-sample guarantees when training data is available. Our “minimax” generalization bounds are dimensionality-independent and O(√(1/m)) for m samples.
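The polynomial-tail result referenced above corresponds to the classical one-sided Chebyshev (Cantelli) inequality, which is attained by some distribution with the prescribed mean and covariance and is therefore tight over that family. The minimal Python sketch below illustrates this worst-case risk for a fixed weight vector; the reduction to the signed margin y⟨w, x⟩ and the function name cantelli_worst_case_risk are our own illustrative choices under these assumptions, not notation taken from the paper.

```python
import numpy as np

def cantelli_worst_case_risk(w, mu, Sigma):
    """Worst-case 0-1 risk of the linear classifier sign(<w, x>), taken over
    all distributions of the signed margin y * <w, x> with mean <w, mu> and
    variance w' Sigma w: the one-sided Chebyshev (Cantelli) bound, which is
    tight for this distribution family."""
    m_w = float(w @ mu)          # mean of the margin y * <w, x>
    s2_w = float(w @ Sigma @ w)  # variance of the margin
    if m_w <= 0:
        return 1.0               # the bound is vacuous for a non-positive mean margin
    return s2_w / (s2_w + m_w ** 2)

# Toy usage: mean margin 2, margin variance 1 -> worst-case risk 1/(1+4) = 0.2
w = np.array([1.0, 0.0])
mu = np.array([2.0, 0.0])
Sigma = np.eye(2)
print(cantelli_worst_case_risk(w, mu, Sigma))
```

For a σ-sub-Gaussian margin with mean m_w > 0, the standard tail inequality P(margin ≤ 0) ≤ exp(−m_w²/(2σ²)) gives the exponential analogue of this quantity; the standard estimate is not tight, which is what the novel tight exponential-tail bound claimed in the abstract addresses.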
Similar Papers
PAC-Bayes Risk Bounds for Stochastic Averages and Majority Votes of Sample-Compressed Classifiers
We propose a PAC-Bayes theorem for the sample-compression setting where each classifier is described by a compression subset of the training data and a message string of additional information. This setting, which is the appropriate one to describe many learning algorithms, strictly generalizes the usual data-independent setting where classifiers are represented only by data-independent message...
Risk Bounds for Randomized Sample Compressed Classifiers
We derive risk bounds for randomized classifiers in the sample-compression setting, where the classifier specification utilizes two sources of information, viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to data-dependent settings, we derive point-wise versions of the bounds on stochastic sample-compressed classifiers and also r...
A PAC-Bayes Sample-compression Approach to Kernel Methods
We propose a PAC-Bayes sample-compression approach to kernel methods that can accommodate any bounded similarity function and show that the support vector machine (SVM) classifier is a particular case of a more general class of data-dependent classifiers known as majority votes of sample-compressed classifiers. We provide novel risk bounds for these majority votes and learning algorithms that mi...
Dimensionality Dependent PAC-Bayes Margin Bound
Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality-independent. A major advantage of this dimensionality independence is that it can explain the excellent performance of SVM, whose feature spaces are often of high or infinite dimension. In this paper we address the problem of whether such dimensionality indepen...
Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes
PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from independently and identically distributed (IID) data, and it is particularly so for margin classifiers: there have been recent contributions showing how practical these bounds can be either to perform model selection (Ambroladze et al., 2007) or even to directly guide the learning of linear classifie...